Effects of diacritics on Turkish information retrieval

Authors

  • Adil ALPKOÇAK
  • Meltem CEYLAN
Abstract

We investigate the effects of improper use of diacritics in the Turkish alphabet on information retrieval. A diacritic is a supplementary sign added to a letter to change its sound value, and the Turkish alphabet has 5 special letters derived from Latin letters by adding diacritics. The statistical analysis performed in this study shows that retrieval performance decreases significantly when documents and queries use different letter forms, that is, when documents contain letters with diacritics while queries contain only standard Latin letters, or vice versa. To tackle this problem, we propose 3 approaches: token normalization by equivalence classes, document expansion, and query expansion. Experimental evaluations carried out on the Bilkent Turkish information retrieval test collection suggest that the proposed approaches are a promising remedy.
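The abstract names token normalization by equivalence classes and query expansion but gives no implementation details. The following is only a minimal sketch of how these two ideas could look, not the authors' method. The folding table `FOLD` (which also covers the dotless ı), the helper names `normalize_token`, `expand_term`, `build_index`, and `search`, and the whitespace tokenizer are all assumptions made for illustration.

```python
from itertools import product

# Assumed folding table: Turkish letters mapped to plain Latin letters.
# The abstract does not list the letters; this table is illustrative.
FOLD = {"ç": "c", "ğ": "g", "ı": "i", "ö": "o", "ş": "s", "ü": "u"}
_FOLD_TABLE = str.maketrans(FOLD)

# Reverse view: each plain letter and its possible marked form.
UNFOLD = {plain: (plain, marked) for marked, plain in FOLD.items()}


def normalize_token(token):
    """Map a token to one representative of its equivalence class."""
    # Replace dotted capital İ first: str.lower() would otherwise turn it
    # into "i" followed by a combining dot.
    return token.replace("İ", "i").lower().translate(_FOLD_TABLE)


def expand_term(term):
    """Query expansion: list every diacritic variant of a term.

    The number of variants doubles with each ambiguous letter, so this
    is only practical for short terms.
    """
    base = normalize_token(term)
    choices = [UNFOLD.get(ch, (ch,)) for ch in base]
    return sorted("".join(combo) for combo in product(*choices))


def build_index(docs):
    """Build a toy inverted index keyed by normalized tokens."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(normalize_token(token), set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every normalized query term."""
    postings = [index.get(normalize_token(t), set()) for t in query.split()]
    return set.intersection(*postings) if postings else set()


docs = {1: "Çağrı merkezi", 2: "cagri merkezi", 3: "kitap"}
index = build_index(docs)
```

With this toy index, `search(index, "cagri")` and `search(index, "ÇAĞRI")` both return documents 1 and 2, because documents and queries are folded to the same form. Normalization keeps the index small but merges distinct words that differ only in diacritics, whereas expansion such as `expand_term("su")` keeps the original forms at the cost of more query terms.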


Similar resources

Information retrieval on Turkish texts

We study information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language dependent corpus statistics, and an elaborat...


Experiments to Improve Named Entity Recognition on Turkish Tweets

Social media texts are significant information sources for several application areas including trend analysis, event monitoring, and opinion mining. Unfortunately, existing solutions for tasks such as named entity recognition that perform well on formal texts usually perform poorly when applied to social media texts. In this paper, we report on experiments that have the purpose of improving nam...


Information Retrieval Effectiveness of Turkish Search Engines

This is an investigation of information retrieval performance of Turkish search engines with respect to precision, normalized recall, coverage and novelty ratios. We defined seventeen query topics for Arabul, Arama, Netbul and Superonline. These queries were carefully selected to assess the capability of a search engine for handling broad or narrow topic subjects, exclusion of particular inform...


Studying Users’ Emotions Attribution Style in Information Retrieval Based on Weiner’s Emotion Attribution Theory

Background and Aim: This research aimed to study the emotion attribution style of users in information retrieval based on Weiner's theory. Methods: The survey method was used in this study. The population consisted of graduate students in humanities at Imam Reza (AS) International University. A sample of 72 students was selected. Data were collected by attribution style questionnaire (ASQ) and two resea...


Factors Affecting Student's Scientific Information Retrieval based on Fuzzy Logic Method Compared to Traditional Method

Background and aim: The aim of this study was to identify the factors affecting students' performance in information retrieval based on the fuzzy logic method compared to the traditional method. Materials and methods: This survey-descriptive study was performed using a quantitative approach. The research population was 34 PhD students, and a researcher-made questionnaire was used. Data were analyzed...




Publication date: 2012